discrete distribution
Instance-Optimal Private Density Estimation in the Wasserstein Distance
Estimating the density of a distribution from samples is a fundamental problem in statistics. In many practical settings, the Wasserstein distance is an appropriate error metric for density estimation. For example, when estimating population densities in a geographic region, a small Wasserstein distance means that the estimate is able to capture roughly where the population mass is. In this work we study differentially private density estimation in the Wasserstein distance. We design and analyze instance-optimal algorithms for this problem that can adapt to easy instances.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- North America > United States > California > Alameda County > Berkeley (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (25 more...)
- North America > United States > California > San Diego County > San Diego (0.05)
- North America > United States > California > San Diego County > La Jolla (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.56)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (4 more...)
Sample Complexity of Forecast Aggregation
We consider a Bayesian forecast aggregation model where n experts, after observing private signals about an unknown binary event, report th eir posterior beliefs about the event to a principal, who then aggregates the repor ts into a single prediction for the event. The signals of the experts and the outcome of the event follow a joint distribution that is unknown to the principal, but th e principal has access to i.i.d. "samples" from the distribution, where each sampl e is a tuple of the experts' reports (not signals) and the realization of the even t. Using these samples, the principal aims to find an ε -approximately optimal aggregator, where optimal-ity is measured in terms of the expected squared distance bet ween the aggregated prediction and the realization of the event.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (4 more...)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.93)
- Workflow (0.67)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Nebraska > Lancaster County > Lincoln (0.14)
- Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)
- (4 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Nebraska > Lancaster County > Lincoln (0.14)
- Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)
- (4 more...)
Gradient-based Discrete Sampling with Automatic Cyclical Scheduling
Discrete distributions, particularly in high-dimensional deep models, are often highly multimodal due to inherent discontinuities. While gradient-based discrete sampling has proven effective, it is susceptible to becoming trapped in local modes due to the gradient information. To tackle this challenge, we propose an automatic cyclical scheduling, designed for efficient and accurate sampling in multimodal discrete distributions. Our method contains three key components: (1) a cyclical step size schedule where large steps discover new modes and small steps exploit each mode; (2) a cyclical balancing schedule, ensuring balanced proposals for given step sizes and high efficiency of the Markov chain; and (3) an automatic tuning scheme for adjusting the hyperparameters in the cyclical schedules, allowing adaptability across diverse datasets with minimal tuning. We prove the non-asymptotic convergence and inference guarantee for our method in general discrete distributions. Extensive experiments demonstrate the superiority of our method in sampling complex multimodal discrete distributions.
Collaborative Learning of Discrete Distributions under Heterogeneity and Communication Constraints
In modern machine learning, users often have to collaborate to learn distributions that generate the data. Communication can be a significant bottleneck. Prior work has studied homogeneous users---i.e., whose data follow the same discrete distribution---and has provided optimal communication-efficient methods. However, these methods rely heavily on homogeneity, and are less applicable in the common case when users' discrete distributions are heterogeneous. Here we consider a natural and tractable model of heterogeneity, where users' discrete distributions only vary sparsely, on a small number of entries. We propose a novel two-stage method named SHIFT: First, the users collaborate by communicating with the server to learn a central distribution; relying on methods from robust statistics. Then, the learned central distribution is fine-tuned to estimate the individual distributions of users. We show that our method is minimax optimal in our model of heterogeneity and under communication constraints. Further, we provide experimental results using both synthetic data and $n$-gram frequency estimation in the text domain, which corroborate its efficiency.